Predicting Upcoming Board Games

Predictive Models for BoardGameGeek Ratings

Author

Phil Henrickson

Published

3/2/25

Pipeline

I use historical data from BoardGameGeek (BGG) to train a number of predictive models for board games. I first classify games by their probability of achieving a minimum number of user ratings on BGG. I then estimate each game's complexity (average weight), which I use to predict its number of user ratings and its average rating. Finally, I use these estimates to compute the expected Geek Rating.
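The sequence of steps above can be sketched as a single prediction function. This is a minimal sketch under stated assumptions, not the author's code: the `Models` container and the idea that the average-rating and user-rating models take the predicted weight as an input are hypothetical. The Geek Rating (`bayesaverage`) is treated as a Bayesian average that shrinks the predicted average toward a prior rating using a fixed number of dummy ratings; BGG does not publish its exact values, so `prior` and `dummies` are placeholders.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

# Hypothetical feature representation: a mapping of feature name to value.
Features = Dict[str, Any]

@dataclass
class Models:
    """Hypothetical container for the fitted models in the pipeline."""
    hurdle: Callable[[Features], float]             # P(game reaches 25 ratings)
    averageweight: Callable[[Features], float]      # predicted complexity
    average: Callable[[Features, float], float]     # takes predicted weight
    usersrated: Callable[[Features, float], float]  # takes predicted weight

def predict_geek_rating(x: Features, models: Models, threshold: float,
                        prior: float = 5.5, dummies: int = 1000) -> Optional[float]:
    """Return an expected Geek Rating, or None if the game fails the hurdle.

    `prior` and `dummies` are placeholder Bayesian-average parameters,
    not BGG's actual (unpublished) values.
    """
    # Step 1: the hurdle model filters out games unlikely to reach the minimum.
    if models.hurdle(x) < threshold:
        return None
    # Step 2: estimate complexity, then feed it to the downstream models.
    weight = models.averageweight(x)
    avg = models.average(x, weight)
    n = max(models.usersrated(x, weight), 0.0)
    # Step 3: combine into a Bayesian average shrunk toward `prior`.
    return (n * avg + dummies * prior) / (n + dummies)

# Usage with stub models standing in for fitted ones.
stubs = Models(
    hurdle=lambda x: 0.9,
    averageweight=lambda x: 2.5,
    average=lambda x, w: 7.5,
    usersrated=lambda x, w: 1000.0,
)
print(predict_geek_rating({}, stubs, threshold=0.3))  # 6.5
```

With as many dummy ratings as predicted real ones, the expected Geek Rating lands halfway between the predicted average and the prior; games with more ratings are pulled toward the prior less.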

The following (somewhat messy) visualization displays the status of the current pipeline used to train models and predict new games.

Assessment

How did the models perform in predicting games?

I used a training-validation approach based on the year in which games were published. I created a training set of games published before 2022 and evaluated the models' performance in predicting games published from 2022 to 2023.
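A year-based split like this can be sketched in a few lines. This is a minimal sketch assuming each game is a record with an integer `yearpublished` field (a hypothetical name here); games published after 2023 are left out of both sets.

```python
from typing import Any, Dict, List, Tuple

Game = Dict[str, Any]

def split_by_year(games: List[Game], valid_start: int = 2022,
                  valid_end: int = 2023) -> Tuple[List[Game], List[Game]]:
    """Split games into training (published before `valid_start`) and
    validation (published from `valid_start` through `valid_end`) sets.

    Assumes each game has an integer `yearpublished` field (hypothetical name).
    Games published after `valid_end` are excluded from both sets.
    """
    train = [g for g in games if g["yearpublished"] < valid_start]
    valid = [g for g in games
             if valid_start <= g["yearpublished"] <= valid_end]
    return train, valid

games = [{"name": "A", "yearpublished": 2019},
         {"name": "B", "yearpublished": 2021},
         {"name": "C", "yearpublished": 2022},
         {"name": "D", "yearpublished": 2023},
         {"name": "E", "yearpublished": 2025}]
train, valid = split_by_year(games)
print([g["name"] for g in train], [g["name"] for g in valid])  # ['A', 'B'] ['C', 'D']
```

Splitting by publication year rather than at random mirrors how the models are used: they are trained on past games and then asked to predict newer ones.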

BGG Ratings

Min. ratings  Model          Outcome        RMSE      MAE      MAPE (%)  R²     CCC
25            glmnet         average        0.674     0.498    7.363     0.295  0.488
25            lightgbm       averageweight  0.436     0.336    18.117    0.708  0.827
25            glmnet+glmnet  bayesaverage   0.284     0.159    2.646     0.430  0.648
25            glmnet         usersrated     1925.712  445.795  154.937   0.123  0.337
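For reference, the metrics in the table can be computed as follows. This sketch uses standard textbook definitions: MAPE is expressed as a percentage, R² as the squared Pearson correlation between truth and estimate, and CCC as Lin's concordance correlation coefficient with population (divide-by-n) variances. The software used to produce the table may differ in small details such as the variance denominator.

```python
import math
from typing import Sequence

def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)

def rmse(truth: Sequence[float], estimate: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(_mean([(t - e) ** 2 for t, e in zip(truth, estimate)]))

def mae(truth: Sequence[float], estimate: Sequence[float]) -> float:
    """Mean absolute error."""
    return _mean([abs(t - e) for t, e in zip(truth, estimate)])

def mape(truth: Sequence[float], estimate: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (undefined if any truth is 0)."""
    return 100 * _mean([abs((t - e) / t) for t, e in zip(truth, estimate)])

def rsq(truth: Sequence[float], estimate: Sequence[float]) -> float:
    """Squared Pearson correlation between truth and estimate."""
    mt, me = _mean(truth), _mean(estimate)
    cov = sum((t - mt) * (e - me) for t, e in zip(truth, estimate))
    vt = sum((t - mt) ** 2 for t in truth)
    ve = sum((e - me) ** 2 for e in estimate)
    return cov ** 2 / (vt * ve)

def ccc(truth: Sequence[float], estimate: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient (population variances)."""
    n = len(truth)
    mt, me = _mean(truth), _mean(estimate)
    cov = sum((t - mt) * (e - me) for t, e in zip(truth, estimate)) / n
    vt = sum((t - mt) ** 2 for t in truth) / n
    ve = sum((e - me) ** 2 for e in estimate) / n
    return 2 * cov / (vt + ve + (mt - me) ** 2)

truth, est = [2.0, 4.0, 6.0], [3.0, 4.0, 5.0]
print({f.__name__: round(f(truth, est), 3) for f in (rmse, mae, mape, rsq, ccc)})
# {'rmse': 0.816, 'mae': 0.667, 'mape': 22.222, 'rsq': 1.0, 'ccc': 0.8}
```

The toy example shows why both R² and CCC are worth reporting: predictions that are perfectly correlated with the truth but compressed toward the mean score R² = 1, yet CCC = 0.8, because CCC also penalizes departures from the line where prediction equals truth.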

Predictions for games in the validation set.

Hurdle

I first predict whether games are expected to receive enough ratings to be assigned a Geek Rating (25 ratings). This is a classification model that assigns each game a probability; to classify games, I need to choose an appropriate probability threshold.

I select this threshold by examining performance across a variety of classification metrics, choosing the threshold that maximizes the F2 measure. The F2 measure weights recall more heavily than precision, so it favors thresholds with fewer false negatives. Because I use the hurdle model to filter out games that are very unlikely to receive ratings, wrongly including a game is less costly than missing one.
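The threshold selection described above can be sketched as a grid search over candidate cutoffs that keeps the one with the highest F-beta score, using F-beta = (1 + β²)·TP / ((1 + β²)·TP + β²·FN + FP). With β = 2, a cutoff that lets in a few extra games scores better than one that misses games that do reach 25 ratings. This is a minimal sketch under assumptions: the grid of cutoffs, the tie-breaking rule, and the function names are illustrative, not the author's implementation.

```python
from typing import List, Optional, Sequence, Tuple

def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from confusion counts; beta > 1 weights recall over precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0

def best_threshold(probs: Sequence[float], labels: Sequence[bool],
                   beta: float = 2.0,
                   grid: Optional[List[float]] = None) -> Tuple[float, float]:
    """Return (cutoff, score) maximizing F-beta over a grid of cutoffs.

    A game is classified as passing the hurdle when its probability is at or
    above the cutoff. Ties keep the lowest cutoff (an illustrative choice).
    """
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]  # 0.01, 0.02, ..., 0.99
    best_t, best_score = grid[0], -1.0
    for t in grid:
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y)
        fp = sum(1 for p, y in zip(probs, labels) if p >= t and not y)
        fn = sum(1 for p, y in zip(probs, labels) if p < t and y)
        score = f_beta(tp, fp, fn, beta)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Toy example: three games that reached 25 ratings, two that did not.
probs = [0.9, 0.8, 0.3, 0.2, 0.1]
labels = [True, True, True, False, False]
print(best_threshold(probs, labels))  # (0.21, 1.0)
```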

Outcome  Metric        Estimator  Estimate
hurdle   bal_accuracy  binary     0.773
hurdle   kap           binary     0.456
hurdle   mcc           binary     0.494
hurdle   f1_meas       binary     0.650
hurdle   f2_meas       binary     0.764
hurdle   precision     binary     0.520
hurdle   recall        binary     0.866
hurdle   j_index       binary     0.546
hurdle   roc_auc       binary     0.861
hurdle   pr_auc        binary     0.737
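Most of the metrics in this table follow directly from the confusion matrix at the chosen threshold; ROC AUC and PR AUC instead summarize the predicted probabilities across all thresholds, so they are not included here. The sketch below uses standard definitions and a made-up confusion matrix, not the hurdle model's actual counts. One quick consistency check: the J-index equals 2 × balanced accuracy − 1, and the table agrees (2 × 0.773 − 1 = 0.546).

```python
import math
from typing import Dict

def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Threshold-dependent classification metrics from confusion counts.

    Assumes no row or column of the confusion matrix is empty.
    """
    n = tp + fp + fn + tn
    recall = tp / (tp + fn)          # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)

    def f_beta(beta: float) -> float:
        b2 = beta * beta
        return (1 + b2) * precision * recall / (b2 * precision + recall)

    # Cohen's kappa: observed agreement vs. agreement expected by chance.
    p_obs = (tp + tn) / n
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "bal_accuracy": (recall + specificity) / 2,
        "kap": (p_obs - p_exp) / (1 - p_exp),
        "mcc": mcc,
        "f1_meas": f_beta(1.0),
        "f2_meas": f_beta(2.0),
        "precision": precision,
        "recall": recall,
        "j_index": recall + specificity - 1,
    }

# Symmetric toy example: every metric comes out at 0.8 or 0.6.
print({k: round(v, 3) for k, v in binary_metrics(tp=8, fp=2, fn=2, tn=8).items()})
```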

Features

Which features were influential for predicting each outcome?

Predictions

Upcoming Games

The following table displays predicted BGG outcomes for games that are expected to achieve at least 25 user ratings.

Hurdle

This table displays the predicted probability that each game will achieve enough user ratings (25) to be assigned a Geek Rating.